Text File | 1992-06-29 | 21KB | 615 lines
════════════════════════════════
10. GLOSSARY AND INDEX
════════════════════════════════
Each item is indexed by topic and section number. The
first reference is to the topic and section in which the item is
explained or first discussed. The index indicates other notable
places in which the item is mentioned. For example, "accented
characters" below are discussed in some detail in topic 4, section
8. Section 10 of topic 4 and four sections within topic 5 touch on
accented characters further.
≡≡≡≡->> QUESTION:
What terms or expressions are used in Tutorial ONE that
puzzle you or that should be included in this glossary
and index?
<<-≡≡≡≡
═════
A
═════
accented characters
4.8 indicated in DOS with high-bit-set characters
4.10 distribution frequencies
5.1 displayed in MIR program HEAD
5.3 displayed in MIR program F_PRINT
5.4 displayed in MIR program DUMP
5.5 displayed in MIR program FRAGMENT
ANSI C (American National Standards Institute)
2.3 The ANSI draft standard for the C programming
language proposes a basic set of functions and
characteristics; adhering to ANSI C is the best
way to assure maximum portability of C programs
ASCII (The American Standard Code for Information Interchange)
4.8 an agreed-upon assignment of bit patterns to
letters, digits, punctuation, control characters
etc. Mentioned in 64 other sections.
ASCII text
4.8 a file made up of the printable subset of ASCII,
entered from a normal keyboard and displayable on
terminals in ASCII-based operating systems such as
DOS and UNIX
4.10 byte distribution in example of ASCII text file
5.1 MIR program HEAD reports if file not ASCII text
5.3 MIR program F_PRINT extracts ASCII text
6.1 programs for analyzing ASCII text
7.1 fixed length ASCII text records
9.2 MIR program F_TRAIL to remove trailing blanks
ASCII text, extended
4.8 to the printable set of ASCII characters, the
extended set adds accented characters commonly
found in various European languages
A_BYTES
4.7 MIR program to analyze the distribution of byte
frequencies within any file
4.8 and analysis of data types
4.9 and analysis of data presentation
4.10 worked example on extended ASCII with markup
5.5 use locations to examine context
7.1 recognizing printable subsets
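The kind of tally A_BYTES reports can be sketched in a few lines of C. This is an illustration only, not the MIR source: the function name count_bytes is hypothetical, and the sketch works on a buffer already in memory rather than on a file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of MIR: tally how often each of the
   256 possible byte values occurs in buf[0..len-1]. */
void count_bytes(const unsigned char *buf, size_t len,
                 unsigned long counts[256])
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < 256; i++)
        counts[i] = 0;
    for (i = 0; i < len; i++)
        counts[buf[i]]++;
}
```

Run over the six bytes of "banana", the table would hold 3 for 'a', 2 for 'n' and 1 for 'b', with every other entry zero.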
A_LEN
6.1 MIR program to analyze the distribution of line
lengths up to 1024 bytes within any file.
A_OCCUR
5.8 MIR program to count the frequency of occurrence
of identical lines in ASCII text
A_OCCUR2
5.8 MIR program to calculate cumulative frequency of
merged A_OCCUR outputs
A_OCCUR3
5.8 MIR program to reverse an A_OCCUR file back to
repeated lines of ASCII text
A_PATTRN
5.6 MIR program to list every occurrence of a key
character or string in a file
5.7 the power of sorting A_PATTRN outputs
═════
B
═════
batch file
4.5 a text file in DOS containing an orderly series of
commands, each of which runs a program or process
as part of a larger task
BCD (Binary Coded Decimal)
4.8 a set of codes in which a combination of 4 bits is
assigned each digit 0 through 9 (0000, 0001, ...
1001); each 8 bit byte can hold two BCD digits
7.4 used within EBCDIC COBOL records for packing
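As a rough illustration of two BCD digits sharing one byte, the sketch below packs and unpacks a pair of digits. The names bcd_pack and bcd_unpack are hypothetical, and the sketch assumes the first digit goes in the upper 4 bits; it does not handle the sign codes that packed fields in real records may carry.

```c
#include <assert.h>

/* Hypothetical helper: pack two decimal digits (each 0-9) into one
   byte, first digit in the upper 4 bits, second in the lower 4. */
unsigned char bcd_pack(unsigned int tens, unsigned int units)
{
    assert(tens <= 9 && units <= 9);
    return (unsigned char)((tens << 4) | units);
}

/* Hypothetical helper: recover the value 0-99 from a packed byte. */
unsigned int bcd_unpack(unsigned char packed)
{
    return (unsigned int)(packed >> 4) * 10u + (packed & 0x0Fu);
}
```

Packing the digits 4 and 2 gives the bit pattern 0100 0010, which is hexadecimal 42.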
bit
2.2 the smallest measure of computer memory; a single
off/on characteristic that is interpreted as a
zero or a one. A series of bits can be mapped to
binary arithmetic. Example... 10110 =
1 X 2 to the fourth power (1 X 16) +
0 X 2 to the third power (0 X 8) +
1 X 2 squared (1 X 4) +
1 X 2 to the power 1 (1 X 2) +
0 X 2 to the power zero (0 X 1)
which is decimal 22.
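The arithmetic above can be checked with a short C function. This is a minimal sketch; the name bits_to_value is made up for the illustration. It doubles the running total for each bit read and adds the bit, which gives the same result as summing the powers of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert a string of '0' and '1' characters to
   its unsigned value, stopping at the first other character. */
unsigned long bits_to_value(const char *bits)
{
    unsigned long value = 0;
    size_t i;

    assert(bits != NULL);
    for (i = 0; bits[i] == '0' || bits[i] == '1'; i++)
        value = value * 2 + (unsigned long)(bits[i] - '0');
    return value;
}
```

Calling bits_to_value("10110") returns 22, matching the worked example.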
blocked records
4.9 a method of data presentation in which successive
records are grouped in a logically consistent manner
for convenience of reading, writing or storage
9. topic on how to deblock records
BPI (bits per inch)
a measure of the quantity of information held on
magnetic tape; normal measures are 1600 and 6250
BPI
byte
8 bits; one byte can represent 256 different values
byte stream
4.9 the crudest form of file; sequence of bytes which
a program reads sequentially and manipulates
according to content rather than according to
position within the file
6.4 contrast to hierarchical text
═════
C
═════
C language
"a general purpose programming language which
features economy of expression, modern control
flow and data structures, and a rich set of
operators" (Kernighan and Ritchie, The C
Programming Language, page ix), in which source
code requires little or no adaptation to be used
on a wide variety of computers
Canada
the home of the GST (Grab and Squander Tax) and
the place where cold weather comes from; a country
in which natives huddle in their igloos and write
superlative software in a vain attempt to stay warm
CD-ROM (Compact Disc Read Only Memory)
a computer optical storage medium, closely related
to the compact discs used for music, holding 660
million bytes of data, with random access to any
point on the disc in less than two seconds
COBOL (COmmon Business-Oriented Language)
a computer programming language favored in
commercial applications in the 1960s and later,
particularly in mainframe (large computer)
installations
COLRM
5.8 MIR program to remove a specified range of columns
from each line of an ASCII text file.
7.3 extracting a single field from a file consisting
of fixed length ASCII records
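Removing a range of columns from one line might look like the sketch below. It is an assumption-laden illustration, not the COLRM source: the name remove_columns is hypothetical, and columns are assumed to be counted from 1 with both ends of the range included.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: delete columns first..last (1-based,
   inclusive) from a NUL-terminated line, in place. */
void remove_columns(char *line, size_t first, size_t last)
{
    size_t len;

    assert(line != NULL && first >= 1 && first <= last);
    len = strlen(line);
    if (first > len)
        return;             /* line too short; nothing to remove */
    if (last > len)
        last = len;         /* range runs past the end of the line */
    /* Shift the tail, including the terminating NUL, over the range. */
    memmove(line + first - 1, line + last, len - last + 1);
}
```

Removing columns 3 through 5 from "ABCDEFGH" leaves "ABFGH".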
compiler
2.3 computer program used to translate source code
into a machine language program, suitable for
executing on compatible computers with the same
operating system
CompuServe Information System
an electronic information and communication system
with over 900,000 subscribers, widely used for
electronic mail; sometimes abbreviated CIS or CI$;
CompuServe is a registered trademark of
CompuServe, Inc.
concatenate
4.7 to link together, as in a chain; to place several
text files one after another within a combined
file
copyleft
refers to the Free Software Foundation GNU General
Public License in which persons receiving source
code can do almost anything with it except put it
under copyright or patent
CPB
4.6 MIR program to copy any portion of any file to a
new file
5.5 use to get a more detailed, but less convenient,
display than that produced by FRAGMENT
═════
D
═════
DEBLOC_A
9.4 MIR program to remove blocking and insert line
feeds in a variable length blocked ASCII text file
DEBLOC_B
9.5 MIR program to deblock two level binary blocked
files
DIR
a DOS command to list files and their sizes within
a directory
DOS (Disk Operating System)
the most widely used operating system for IBM
compatible personal computers; MS-DOS is a
registered trademark of Microsoft Corporation
DOS executable form
2.3 selected for widest spectrum of potential users
4.5 program in PC compatible machine language ready
for use in an MS-DOS or PC-DOS environment
DOSIFY
5.2 MIR program to replace a UNIX-style text file with
a DOS version in which each line feed is preceded
by one carriage return, and the file ends with one
CTL-Z byte
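The conversion DOSIFY performs can be sketched on an in-memory buffer as follows. The function name dosify is hypothetical and the code is not taken from MIR. As a simplifying assumption, every carriage return and CTL-Z byte already in the input is dropped, so that neither is ever doubled in the output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy in[0..n-1] to out, writing CR LF for each
   line feed and ending with a single CTL-Z (0x1A). Existing CR and
   CTL-Z bytes are dropped (a simplifying assumption). out must hold
   at least 2 * n + 1 bytes. Returns the number of bytes written. */
size_t dosify(const char *in, size_t n, char *out)
{
    size_t i, o = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        if (in[i] == '\r' || in[i] == 0x1A)
            continue;
        if (in[i] == '\n')
            out[o++] = '\r';
        out[o++] = in[i];
    }
    out[o++] = 0x1A;
    return o;
}
```

A caller that must keep a lone carriage return in the middle of a line would need a more careful rule than the one assumed here.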
DUMP
5.4 MIR program to list the contents of a specified
portion of any file, reporting 16 bytes per line
in hexadecimal and (where feasible) printable form
5.5 detailed way to display context at a location
8.2 use to examine file signatures
8.4 use to verify binary blocking
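One line of a 16-bytes-per-line listing could be formatted as in the sketch below. The layout (an 8 digit offset, the hexadecimal bytes, then the printable form with a period standing in for unprintable bytes) is an assumption made for the illustration and may differ from what DUMP actually prints. The name dump_line is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: format up to 16 bytes as
   "offset  hex bytes  printable form" into line, which must hold at
   least 80 characters. */
void dump_line(unsigned long offset, const unsigned char *buf,
               size_t n, char *line)
{
    size_t i;
    int pos;

    assert(buf != NULL && line != NULL && n <= 16);
    pos = sprintf(line, "%08lX ", offset);
    for (i = 0; i < 16; i++) {
        if (i < n)
            pos += sprintf(line + pos, " %02X", (unsigned)buf[i]);
        else
            pos += sprintf(line + pos, "   ");  /* pad a short line */
    }
    pos += sprintf(line + pos, "  ");
    for (i = 0; i < n; i++)
        line[pos++] = isprint(buf[i]) ? (char)buf[i] : '.';
    line[pos] = '\0';
}
```

Padding the hexadecimal area on a short final line keeps the printable column aligned with the lines above it.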
═════
E
═════
EBCDIC (Extended Binary Coded Decimal Interchange Code)
4.8 an agreed-upon assignment of bit patterns to
letters, digits, punctuation, control characters;
an alternative to ASCII, common on IBM mainframes
7.4 may need to re-convert to identify packed values
9.5 DEBLOC_B program
EBC_ASC
4.8 MIR program to convert an EBCDIC file to ASCII
9.5 distorts binary values when converting files
═════
F
═════
field
4.1 unit of data that takes on meaning according to
location or an identifying code; examples...
purchase order number, street address, quantity,
cost per unit, etc.
5.6, 5.7 recognizing field separators
6.5 fielded variable length text
6.6 sequence of data within a field as an analysis aid
7.2 field layouts
7.3 extracting a single field from fixed length data
fixed length records
4.9 a file consists entirely of equal size segments,
and within each segment, fields have specific byte
range assignments which do not vary from one
record to the next
7. topic on worked examples of fixed length records
8.5 binary data within fixed length records
9.3 deblocking fixed length records
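Because every field sits at the same byte range in every record, a field can be found by arithmetic alone. The sketch below assumes the whole file is already in memory and counts records and bytes from zero; the name get_field and the sample layout are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the field occupying bytes
   start..start+width-1 (0-based) of record number recno (0-based)
   into field, and terminate it with a NUL. field must hold
   width + 1 bytes. */
void get_field(const char *data, size_t reclen, size_t recno,
               size_t start, size_t width, char *field)
{
    assert(data != NULL && field != NULL);
    assert(start + width <= reclen);
    memcpy(field, data + recno * reclen + start, width);
    field[width] = '\0';
}
```

With 10 byte records such as "0001SMITH 0002JONES ", the call get_field(data, 10, 1, 4, 6, field) picks the name field "JONES " out of the second record.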
FORTRAN (FORmula TRANslation)
a procedure oriented programming language
developed in the 1950s for solving problems in
mathematics, science and engineering; Fortran is
still in use
FRAGMENT
5.5 MIR program to display a five line fragment of a
file in printable form, providing a quick view of
context
F_PRINT
5.3 MIR program to filter/reduce a file to printable
characters only
F_TRAIL
9.2 MIR program to remove trailing blanks from lines
of ASCII text
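Trimming one line takes only a short loop. The following is a minimal sketch with a made-up name, trim_trailing_blanks, and is not the F_TRAIL source; it treats only the space character as a blank.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove trailing spaces from a NUL-terminated
   line, in place. Returns the new length. */
size_t trim_trailing_blanks(char *line)
{
    size_t len;

    assert(line != NULL);
    len = strlen(line);
    while (len > 0 && line[len - 1] == ' ')
        len--;
    line[len] = '\0';
    return len;
}
```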
═════
G
═════
gigabyte
1,073,741,824 characters of data (2 to the power 30)
GNU (GNU's Not UNIX)
a recursive acronym for the Free Software
Foundation's alternative to the UNIX operating
system; a diabolical threat to mental health if
one is asked too frequently: "What's GNU?"
═════
H
═════
hard copy
4.4, 7.4 data printed on paper as an aid to analysis
hardware
2.3 the physical components of a computer (case, disk
drives, boards, chips, etc.) and its peripheral
equipment (printer, external drives, terminal,
cables, etc.); what you can see, feel, hear, and
(when the terminal has been on too many hours)
smell
HEAD
5.1 MIR program to display lines at the beginning or
end of a text file
5.2 use to recognize non-DOS text
hexadecimal notation
4.7 Arithmetic to the base 16; the rightmost digit in
a hexadecimal number is a multiple of 16 to the power 0
(i.e., 1), the next digit 16 to the power 1, the
third digit from the right 16 to the power 2, etc.
The hexadecimal digits are 0 1 2 3 4 5 6 7 8 9 A B
C D E and F. Example: hexadecimal 6D is 6 X 16
plus 13 X 1 which in decimal arithmetic is 109 and
in ASCII code is the letter 'm'. The 256 possible
values in one byte are hexadecimal 00 through FF.
Note one hexadecimal digit represents 4 bits.
5.4 output from DUMP program
5.6 output from A_PATTRN program when /x argument used
HEX_BIN
8.4 MIR program to create test files with any
combination of printable and binary characters
high-bit-set
4.8 the first (most significant) of the eight bits in a byte is turned on, so the byte value is 128 or higher
4.9 bytes show up in binary length blocked records
4.10 used in DOS for accented characters
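In C the test for a high-bit-set byte is a single mask with hexadecimal 80. The sketch below, with the made-up name count_high_bit, counts such bytes in a buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the bytes in buf[0..len-1] whose high
   bit (hexadecimal 80) is set. */
size_t count_high_bit(const unsigned char *buf, size_t len)
{
    size_t i, count = 0;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        if (buf[i] & 0x80)
            count++;
    return count;
}
```

A count of zero over a whole file is one quick sign that the file holds no bytes outside the 7 bit ASCII range.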
homonyms
words of different meaning which share the same
spelling (a significant problem in indexing)
═════
I
═════
IBM
registered trademark of International Business
Machines Corporation
ISO 9660
Standard controlling the headers and file
references on CD-ROM that permits any computer
program written to standard to access files in
conforming CD-ROM readers of any manufacturer; ISO
= International Organization for Standardization
═════
J
═════
═════
K
═════
═════
L
═════
line records
4.9 segments of text padded to a fixed length
9.2 reducing line records
LINES
6.1 MIR program to provide a quick count of the number
of lines in each of one or more text files
LINE_NUM
6.1 MIR program to assign a sequence number to each
line in a text file
═════
M
═════
markup codes
4.8 embedded signals which direct how data should be
displayed
3.6 and standards; and SGML
6.2 ASCII markup patterns
6.3 Standard Generalized Markup Language
8.1 binary markup
media
alternate methods of storing data so that it may
be entered readily into computer memory; examples
are hard disk, floppy diskette, optical disk,
magnetic tape, laser card, punched card, punched
tape
media independent
describes a technique that works the same way
whatever data storage technology is selected
MIR (Mass Indexing and Retrieval)
project whose output is a set of tutorials, plus
extensive C language source code under copyleft
rules, aimed at enabling technical people to write
or adapt software leading to high speed retrieval
in any size database
mouse
2.1 a hand operated device to point to objects or text
on a computer screen; a mouse-click on an object
or piece of text acts as a command to a program
═════
N
═════
NEWLINES
7.1 MIR program to insert carriage returns and line
feeds at regular intervals, to deblock data
received in line blocks
7.3 use to extract a field from a fixed length ASCII
text file
9.2 use to deblock line records
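Breaking a run of data into lines at a regular interval can be sketched as below. The function name insert_newlines is hypothetical and the code works on a buffer in memory. As an assumption, a short final piece also receives a carriage return and line feed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy in[0..n-1] to out, adding CR LF after
   every reclen bytes and after a short final piece. out must hold
   n + 2 * (n / reclen + 1) bytes. Returns the bytes written. */
size_t insert_newlines(const char *in, size_t n, size_t reclen,
                       char *out)
{
    size_t i, o = 0;

    assert(in != NULL && out != NULL && reclen > 0);
    for (i = 0; i < n; i++) {
        out[o++] = in[i];
        if ((i + 1) % reclen == 0 || i + 1 == n) {
            out[o++] = '\r';
            out[o++] = '\n';
        }
    }
    return o;
}
```

With an interval of 4, the ten bytes "AAAABBBBCC" become three lines: AAAA, BBBB and CC.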
═════
O
═════
octal notation
4.7 Arithmetic to the base 8; the rightmost digit in
an octal number is a multiple of 8 to the power 0
(i.e., 1), the next digit 8 to the power 1, the
third digit from the right 8 to the power 2, etc.
Example: octal 376 is 3 X 64 plus 7 X 8 plus 6 X 1
which in decimal arithmetic is 254. The 256
possible values in one byte are octal 000 through
377. Note one octal digit represents 3 bits.
7.3 used by the UNIX utility TR
OCR (Optical Character Recognition)
3.5 computer software and a scanning device interact
to convert text on paper into machine-readable
form
3.7 human checking for validity
open architecture
describes hardware and software in which the
technical detail is made generally available
operating system
2.3 the software and data that initiate, coordinate
and direct the components of a computer; serves
as an intermediary between the user's programs and
the computer hardware
═════
P
═════
preprocessing
the use of a wide variety of techniques to bring
data into a standardized form; used in MIR in
preparation for automated indexing
P_FIXED
9.3 MIR program to convert a fixed record length file
to ASCII with field numbers
P_MARC
9.5 Program source code, untested, to deblock MARC
library records
═════
Q
═════
═════
R
═════
RAM (Random Access Memory)
2.2 making do with little high speed memory
8.6 use of RAM in decompression
reboot
2.1 restart a computer by pressing a reset button or
(on a PC compatible) by pressing the three keys
CTL-ALT-DEL at the same time; an inelegant way to
escape from a badly written computer program
REPLACE1
7.3 table-driven MIR program to replace every byte in
an input file with exactly one alternate byte
(passing reference; full write-up in Tutorial TWO)
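The table-driven idea can be sketched with a 256 entry array indexed by byte value. This is an illustration under assumptions, not the REPLACE1 source: the names init_table and replace_bytes are hypothetical, and the table is filled in by the caller rather than read from a file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: start with a table that maps every byte value
   to itself, so only the entries the caller changes have any effect. */
void init_table(unsigned char table[256])
{
    int i;

    for (i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
}

/* Hypothetical helper: replace every byte of buf with its entry in
   table, in place. */
void replace_bytes(unsigned char *buf, size_t len,
                   const unsigned char table[256])
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}
```

After init_table(table), setting table[','] to a tab character and calling replace_bytes turns every comma in the buffer into a tab and leaves all other bytes alone.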
═════
S
═════
SFQL (Structured Full text Query Language)
proposed standard to enable "interoperability" of
CD-ROMs and software interfaces by different
vendors
SGML (Standard Generalized Markup Language)
6.3 introduction to SGML
3.6 user control over format
SORT2
5.8 MIR program to sort large text files using the
memory-bound DOS SORT routine in multiple passes
source code
2. the form in which computer programs are normally
written and changed, in a "language" which a
compiler program can translate into machine
language for high speed use; without access to
source code it is very difficult to make changes
to a program to accommodate it to new needs
stdin (standard input)
instead of taking data from a named file, a
program receives data directly from another
program or from a terminal; risky in DOS for non-
text files
stdout (standard output)
the result of a program is sent to another program
or to a terminal; risky in DOS for non-text files
═════
T
═════
═════
U
═════
UNIX
a computer operating system and trademark of Bell
Laboratories
═════
V
═════
═════
W
═════
WordPerfect
the word processor used to create the topics on
the MIR diskettes; WordPerfect is a registered
trademark of WordPerfect Corporation
8.3 converting a file to ASCII
WYSIWYG (What You See Is What You Get)
6.2 the simplest form of text file
6.3 untagged SGML
8.3 WordPerfect ASCII conversion
═════
X
═════
═════
Y
═════
═════
Z
═════